Abstract

A framework for categorizing constructed-response items was 
developed in which items were ordered on a continuum from 
multiple-choice to presentation/performance according to the 
degree of constraint placed on the examinee's response. Two 
investigations were carried out to evaluate the validity of this 
framework. In the first investigation, 27 test development staff 
assigned 46 items of various formats to the categories. Overall, 
agreement with the intended item categorizations was good, with a 
median of two of a possible 27 judges disagreeing with a given 
item's classification. In the second investigation, responses of 
40 examinees each to four sets of items were scored by test 
development staff, with each set scored by four individuals. 
Results showed scoring agreement to be highest for a category 
requiring the examinee to choose a response from an extended 
stimulus array and lowest for items requiring that the stimulus
be reordered to form a correct sequence. Whether the reported 
agreement levels represent sufficient accuracy to permit the 
widespread use of such items in standardized tests depends on 
whether some degree of scoring error, however small, can be 
accepted.

Toward a Framework for Constructed-Response Items

The multiple-choice item has been and remains the mainstay
of large-scale testing programs in the United States. There are 
several reasons that support this choice. First, compared with 
item types requiring judgmental keying, scoring is objective and 
reliable. Second, test scoring can be automated and thus can
be inexpensive and swift. Third, relative to some other formats, 
items can be answered very rapidly. This means that, within a 
limited period, it is possible to obtain the broad content 
sampling necessary to assure that a test provides a reliable and 
generalizable representation of a domain. Finally, a 
sophisticated statistical technology has been built to support 
the analysis of these items (e.g., Lord, 1980).

Whereas multiple-choice items have important advantages, 
they also have significant limitations (N. Frederiksen, 1984).
For one, multiple-choice items are more easily used to test 
specific, isolated pieces of knowledge than to measure higher- 
order skills, such as problem solving, in the real-world contexts 
in which they are normally used. Second, these items can be 
answered correctly, with relatively high probability, by 
guessing. Guessing introduces error into the measurement of 
performance, particularly for low ability examinees and for 
difficult tests. Third, unless items are very carefully 
constructed, this item type is susceptible to coaching based on
strategies that deal with superficial characteristics of the item 
rather than the examinee's knowledge of the content the item is 
intended to assess. Finally, as usually constructed and scored,
the multiple-choice item does not provide for the assessment of
partial knowledge or for the identification of diagnostic 
information concerning the source of an examinee's errors. While 
there are techniques by which this limitation can be overcome 
(e.g., Coombs scoring to assess partial knowledge), other item 
formats may be more suitable for the attainment of these 
objectives. 

The limitations of the multiple-choice format become 
particularly evident when viewed in the context of recent 
pressures for educational reform (Fiske, 1990; J. R. Frederiksen 
& Collins, 1989). In this context, tests are expected to (1) 
emphasize higher-order processes so that problem-solving skills
will be more rapidly incorporated in curricula, (2) facilitate 
instruction by identifying specific skills individual learners 
have yet to master, and (3) measure the outcomes of curriculum 
reform efforts intended to enhance higher-order skills. 

Various item formats retain the amenability to machine 
scoring of the multiple-choice item while ameliorating some of 
its less desirable features. Carlson (1985) describes a number 
of these formats. Particularly attractive are variations of the 
keylist or master-list item. In one version, a set of item stems 
is presented with a common list of possible responses; correct 
responses to one stem serve as distractors for the others. This 
format eliminates the need to create plausible distractors for 
each stem. The use of a relatively long list of possible 
responses can reduce the probability of correct guesses and of 
"gamesmanship" strategies for choosing correct answers. The 
format can also allow for multiple correct responses to a
question in order, for example, to accommodate regional 
differences in terminology. 

Whereas item types like the keylist can increase the
flexibility with which assessment is performed, they are not 
sufficiently open for some assessment purposes. The limitation 
to items in which the examinee is to recognize a correct option 
is artificial; some real-world situations have this character, 
but others require that an individual generate solutions to a 
problem without being presented with the alternatives (Ward, N. 
Frederiksen, & Carlson, 1980). Still others require that the
individual identify the problem, rather than address a problem 
posed by someone else. Tests that mirror these characteristics 
of skilled or intelligent performance are needed to provide valid 
representation of the range of skills for which assessment is 
desired (Nickerson, 1989; J. R. Frederiksen & Collins, 1989).
Such measures are particularly relevant when the interest is in 
assessing higher-order skills — the ability to acquire, organize, 
and apply knowledge and strategies — rather than the simple 
possession of information or algorithms. 

Over the past decade, educational researchers in reading, 
writing, and mathematics have increasingly emphasized the need 
for instruction in the thinking and problem-solving skills
required for competence, as opposed to a concentration on 
mechanics and errors. If for no other reasons than to secure 
credibility and face validity, assessors may need to provide 
instruments that involve significant productivity on the part of
examinees. 

With the prospect of changing needs and uses for test 
information, and with the increasing availability of technologies 
that can facilitate the scoring of more complex responses 
(Bennett, in press), it is appropriate to rethink our dependence
on the multiple-choice item and consider the advantages and 
limitations of potential alternatives. That process should be 
aided by a framework for organizing item types. Such a scheme 
should help identify relevant item characteristics, suggest 
research questions, aid in organizing research results, and 
perhaps stimulate new development directions. 

This paper presents the beginnings of such a framework by 
describing an initial set of item categories intended to capture 
the range of constructed-response item types. Also reported are 
empirical analyses of the consistency of judges' classifications 
using these categories and of the relationship between category 
membership and scoring reliability. 

A Preliminary Framework 

Figure 1 depicts a categorization of item types according to 
the task presented, where the main variant is the extent of 
openness allowed in the response. The categories, which were 
constructed from a review of individually and group-administered
achievement and ability test items, are intended to represent 
discernible points along this "openness" continuum. Seven 
categories are listed from more to less constrained: multiple-choice,
selection/identification, reordering/rearrangement,
substitution/correction, completion, construction, and
presentation/performance.



Insert Figure 1 about here 



To be of minimum utility, such a categorization must have at 
least two characteristics. First, classifications of items into 
the intended categories must be consistently made by any 
reasonable judge. Second, categorizations must be associated 
with item attributes deemed important to the measurement process. 
One such property is scoring objectivity. At the extremes, the 
objective nature of multiple-choice items is well established 
whereas experience suggests that presentation/performance tasks 
are more difficult to grade reliably. 

This report presents data on both characteristics of the 
categorization. Specifically, the concern was with (1) whether 
independent judges agreed among themselves in placing items into 
the intended categories and (2) whether judges could score items 
from different categories with equal accuracy. 

Method 

Judges 

Judges were two groups of ETS test development staff 
experienced in the construction of verbal tests. For the first 
part of the study (assessing the consistency of classifications),
fifty-three test developers were asked to participate and 27 
returned responses. Of those participating, the mean number of 
years of test development experience was 10.0, with a standard 
deviation of 6.4. Just over half of these individuals (52%)
reported as their highest degree a master's with most of the 
remainder (40%) holding the doctorate. Most individuals (68%) 
indicated that the humanities constituted their major field of 
study, with all but one other majoring in the social sciences. 

For the second part (assessing scoring objectivity), 16 test
developers were sought. The 16 who agreed to participate had a 
mean of 9.8 years of test development experience (standard 
deviation = 7.1). Nine reported as their highest degree a 
master's with all but one of the rest holding the doctorate. 
Again, most (12 of 16) indicated that the humanities constituted 
their major field of study, the others having had their formal 
education in the social sciences. 

Procedure 

A third group of nine test developers experienced in the 
generation of verbal tests was asked to construct multiple items 
conforming to the specifications for categories 0-5 above, with 
more than one item subtype represented in each category. Each 
developer was asked to write items that measured sentence-level 
verbal skills (e.g., grammaticality and style, semantic
processing) at a level appropriate to college freshmen. Because 
these skills could not easily be represented using presentation/ 
performance items, this category was dropped from the empirical 
investigation. Developers were directed to write items 
measuring, to the extent possible, similar content within and 
across item categories, with the ideal result being items 
distinguishable from one another primarily on the basis of
response format.

In all, 46 items were developed. Because the category 
definitions given test developers left room for interpretation, 
the items did not in all cases fit the categorizations intended 
by the authors. For this reason, several items were reassigned. 
The resulting distributions ranged from 6 to 9 items per category 
(see Appendix A).

Item classification. To explore the consistency and
correctness of item classifications, test developers not involved 
in the construction of the items (the first group of judges 
described above) were given the category specifications and asked 
to classify each item without knowing to which category the item 
belonged. Consistency was estimated using the models and methods 
of generalizability theory (Cronbach, Gleser, Nanda, &
Rajaratnam, 1972). In generalizability theory, variances
associated with the different components contributing to the 
total variation in a set of test scores, or ratings, are 
estimated. These variance components are assigned to true or 
error variance depending upon the purpose of the measurement 
procedure.

A three-way analysis of variance was used to estimate the 
variance components of the following mixed model:

$Y_{ijk} = \mu + \alpha_i + \beta_j + I_{k(i)} + \alpha\beta_{ij} + \beta I_{jk(i)}$

where $Y_{ijk}$ is the ordered classification assigned by the jth judge
to the kth item in the ith category, $\alpha$ is the category effect, a
fixed facet representing the complete population of categories,
and $\beta$, judge effect, and $I$, item effect, are random facets
presumed to be sampled from infinite populations of judges and 
items, respectively. The data analyzed were the classifications 
assigned by each of the 27 judges for six items sampled from each 
of the six categories. 

For this analysis, category was considered to represent true 
variance. Allocated to observed variance was, in addition to 
category, variance due to judges, to items within categories, to 
the category-by-judge interaction, and to the item-by-judge
within category interaction. A generalizability coefficient for 
a single judge was computed by dividing the true variance 
estimate by the observed variance as per Thorndike (1982, pp. 166-167).
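
Written out, the single-judge coefficient described above takes the following form. The subscripts are shorthand introduced here for illustration (c = category, j = judge, i:c = item within category) and are not notation taken from the report.

```latex
\[
\hat{\rho}^{2}_{\text{single judge}}
  = \frac{\hat{\sigma}^{2}_{c}}
         {\hat{\sigma}^{2}_{c} + \hat{\sigma}^{2}_{j} + \hat{\sigma}^{2}_{i:c}
          + \hat{\sigma}^{2}_{cj} + \hat{\sigma}^{2}_{ij:c}}
\]
```

A value near 1 would indicate that almost all of the observed variation in classifications is attributable to the category for which an item was written.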

To explore the correctness of item classifications, several 
analyses were conducted using all 46 items. First, the product- 
moment correlation between each judge's classifications and the 
item's intended category classification was computed and these 
correlations averaged using the Fisher r-to-z transformation. 
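
The averaging step can be sketched in a few lines of Python. This is a minimal illustration, not the procedure as actually programmed for the study; the function name and the three correlations are hypothetical.

```python
import math

def average_correlation(correlations):
    """Average correlations on Fisher's z scale and convert back to r."""
    z_values = [math.atanh(r) for r in correlations]  # r-to-z transformation
    mean_z = sum(z_values) / len(z_values)
    return math.tanh(mean_z)                          # back-transform z to r

# Hypothetical correlations for three judges (illustrative values only).
example_correlations = [0.80, 0.90, 0.95]
mean_r = average_correlation(example_correlations)
```

Because the transformation stretches values near 1, the back-transformed mean of high positive correlations is slightly larger than their simple arithmetic mean.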

Because this analysis is sensitive only to the extent to 
which judges order the items similarly to the intended ordering, 
and not whether the category placements themselves are correct, 
the difference between each judge's categorization for an item 
and the intended categorization was computed and averaged across 
items. This mean signed difference indicates the extent to which 
the judge misclassified items, even if the ordering was similar
to the intended one, and in what direction the judge's 
classifications diverged.

Finally, the number of judges diverging from the intended
categorization for each item was computed and averaged across 
items within a category to identify which categories were the 
most difficult to classify. The frequency of disagreements for 
each item was also examined to identify problematic subtypes. 
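
The two divergence measures just described (the mean signed difference for each judge and the number of judges disagreeing on each item) can be sketched as follows. The category codes 0-5, the function names, and the small data set are assumptions made for illustration; they are not the study's data.

```python
def mean_signed_difference(judge_codes, intended_codes):
    """Mean of (judge's category - intended category) across items."""
    differences = [j - t for j, t in zip(judge_codes, intended_codes)]
    return sum(differences) / len(differences)

def disagreements_per_item(codes_by_judge, intended_codes):
    """For each item, count the judges whose category differs from the intended one."""
    return [
        sum(1 for judge_codes in codes_by_judge if judge_codes[item] != intended)
        for item, intended in enumerate(intended_codes)
    ]

# Hypothetical data: six items (one per category, coded 0-5) and three judges.
intended = [0, 1, 2, 3, 4, 5]
judges = [
    [0, 1, 2, 3, 4, 5],   # agrees with every intended category
    [0, 3, 2, 3, 4, 5],   # places the second item two categories too high
    [0, 4, 2, 2, 4, 5],   # one item too high, one too low
]
signed = [mean_signed_difference(codes, intended) for codes in judges]
counts = disagreements_per_item(judges, intended)
```

A positive mean signed difference indicates that a judge tended to place items in more open categories than intended; a negative value indicates the reverse.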

Scoring reliability. To evaluate the relationship between
category membership and scoring reliability, responses to 42 of 
the 46 items were collected from student volunteers attending 
Bunker Hill Community College (MA), Central Piedmont Community
College (NC), the College of the Desert (CA), Santa Fe Community
College (NM), and Lewis and Clark Community College (IL). (The
four essays were eliminated to keep the test from consuming an 
inordinate amount of examinee time.) The 42 items were divided 
into four approximately parallel forms (A-D) of 10-11 items, and
each form was administered to a random quarter of the students at 
each college. From the 212 completed student tests, 160 were 
randomly chosen and divided into four sets of 40. Each set of 40 
tests was given to a different group of four raters for scoring. 
The raters independently scored the tests according to a scoring 
guide (see Appendix B) that was reviewed, pilot tested, and 
revised before use. 

For each item, the scores of the four raters for the 40 
examinees were subjected to variance components analysis using 
the following model: 

$Y_{ij} = \mu + \pi_i + \beta_j + \pi\beta_{ij}$

where $Y_{ij}$ is the item score assigned by the jth judge to the ith
examinee, $\pi$ is the person effect, and $\beta$ the judge effect, with
both effects presumed to be random. Considered as error were the
judge effect and the person-by-judge interaction. 
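
The single-rating coefficient implied by this model can be sketched as follows. The estimators are the usual expected-mean-square solutions for a fully crossed persons-by-raters design with one score per cell; the report does not spell out its estimation procedure, so this choice is an assumption, as are the function name and the decision to truncate negative variance estimates at zero. The scores are invented for illustration.

```python
def single_rating_g(scores):
    """Single-rating coefficient for a persons x raters table.

    scores[i][j] is the score given to examinee i by rater j. Rater and
    residual (person-by-rater) variance are both treated as error.
    """
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n_p * n_r)
    person_means = [sum(row) / n_r for row in scores]
    rater_means = [sum(scores[i][j] for i in range(n_p)) / n_p for j in range(n_r)]

    ss_person = n_r * sum((m - grand) ** 2 for m in person_means)
    ss_rater = n_p * sum((m - grand) ** 2 for m in rater_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_resid = ss_total - ss_person - ss_rater

    ms_person = ss_person / (n_p - 1)
    ms_rater = ss_rater / (n_r - 1)
    ms_resid = ss_resid / ((n_p - 1) * (n_r - 1))

    # Variance components from expected mean squares (negative estimates set to 0).
    var_person = max((ms_person - ms_resid) / n_r, 0.0)
    var_rater = max((ms_rater - ms_resid) / n_p, 0.0)
    var_resid = max(ms_resid, 0.0)

    observed = var_person + var_rater + var_resid
    return var_person / observed if observed > 0 else float("nan")

# Hypothetical scores: three examinees, four raters.
perfect_agreement = [[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 2]]
lenient_fourth_rater = [[0, 0, 0, 1], [1, 1, 1, 2], [2, 2, 2, 3]]
g_perfect = single_rating_g(perfect_agreement)
g_lenient = single_rating_g(lenient_fourth_rater)
```

In the second table the fourth rater is uniformly one point more lenient. Because level differences among raters count as error, the coefficient falls from 1.0 to 0.8 even though all four raters rank the examinees identically.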

A generalizability coefficient for a single rating was 
computed for each item along with the median coefficient for each 
category. Emphasis in interpreting these coefficients was on 
their relative rather than absolute magnitude. This emphasis was 
chosen because, although care was taken in developing the scoring 
guide and communicating the task to judges, this experimental 
grading lacked the protections and motivations characteristic of 
operational free-response scorings, such as those conducted by 
the College Board's Advanced Placement Program (Jensen, 1987).
To illustrate, raters in these sessions commonly spend almost as
much time training as the total time available for our 
experimental scoring. Second, the guides developed are often 
critiqued and refined as part of this extensive training process. 
Third, the graders' performance is monitored in real time to 
detect and correct misapplications of the scoring guide.
Finally, raters are motivated by the fact that their judgments
might have an important impact on the examinee. As a 
consequence, this study likely underestimates the agreement 
levels that might be obtained under these more stringent, 
operational conditions. 

Because the categorization scheme orders items by the 
openness of the response and because greater openness is 
generally associated with lower scoring reliability, item 
category membership should be related to scoring reliability. To 
test this hypothesis, items were collapsed across forms and the 
Pearson product-moment correlation between each item's
generalizability coefficient and its category membership was 
computed. The significance of this correlation was tested using 
a one-tailed t-test with alpha set at .05. 

Finally, to assess the extent to which multiple ratings 
might allow the generalizability coefficients of the constructed- 
response categories to approximate the agreement levels of 
multiple-choice, the single-rater coefficients described above 
were stepped up using a method suggested by Winer (1971, p. 287). 
Coefficients were generated for the mean ratings of two, three, 
and four judges. 

Results 

Item Classification 

Results of the variance components analysis are shown in 
Table 1. As can be seen, the largest variance estimate is for 
category, the object of classification, which is true variance. 
Among the error components, the overwhelming portion of variance 
is associated with the item-by-judge within category interaction. 
This interaction indicates that for any given category, the 
extent of agreement among judges differs as a function of the 
particular item being classified. The single-judge 
generalizability coefficient for these data is .95, suggesting 
that classifications can be reliably generalized across judges 
and items. 



Insert Table 1 about here

Table 2 presents the distribution of product-moment
correlations between the intended item categorizations and the 
categorizations made by each judge for all 46 items. As the 
table shows, most judges were able to reproduce the rank ordering 
of the intended categorizations reasonably well: only four of
the 27 correlations fell below .85 and the mean and median
correlation were .94 and .93, respectively. 



Insert Table 2 about here 



To determine whether judges' classifications simply tended 
to duplicate the rank order of the intended classifications as 
opposed to the actual placements, the intended category 
designation for an item was subtracted from the judge's 
designation and these differences averaged across items. The 
distribution of these mean signed differences is presented as 
Table 3. As the table indicates, judges' categorizations 
deviated little from the intended ones: the mean and median of
this distribution were .11 and .07, respectively, an average
deviation of a fraction of a category per item per judge. 



Insert Table 3 about here

Examination of the average number of judges diverging from
the intended item categorization (where the range of
disagreements for an item is 0 to 27) gives an indication of
which categories were the most problematic. Median disagreements
were highest for the selection/identification category (Md = 6)
and lowest for construction and reordering/rearrangement (Md = 0
and .5, respectively). The remaining three categories
(completion, multiple-choice, and substitution/correction) fell in
between, with median disagreements of 2, 2.5, and 3, respectively.

Shown in Table 4 is the number of judges diverging from
each item's intended categorization. This table identifies which
items stand out within categories as difficult to validly
classify. The category with the largest median number of
disagreements, selection/identification, is represented by two
item types, cloze elide and keylist. Neither type seems more 
prone to disagreement than the other. From a review of the 
judges' classifications, these items most frequently appeared to 
be confused with completion items (the case for two of the three
keylists) or with substitution/correction (the case for all five
cloze elide questions). Two of the three keylist items do, in
fact, take a sentence completion format (but one that is followed 
by a list of alternative words to be used to complete the 
sentence), accounting for some judges' confusion. (This same 
confusion was evident for item #13, a multiple-choice question 
presented in a sentence-completion format.) The cloze elide 
items present passages which contain irrelevant or incorrect 
words that the examinee is asked to strike out (i.e., select).
The confusion here seems to be that the examinee is correcting 
the passage, which makes the item appear superficially 
appropriate for the substitution/correction format.

Insert Table 4 about here



Several other items warrant discussion because of the high 
levels of disagreement found for them. The item most difficult 
to classify belongs to the substitution/correction category.
This "construction shift" task (item #41) requires the examinee 
to rewrite a sentence given a new beginning so that the surface 
structure is changed but the original meaning is preserved. This 
item was most often misclassified as reordering/rearrangement, an
understandable choice given that the task appears to be to simply 
rearrange the stimulus sentence. However, the task is slightly 
more complex as the change in structure typically requires some 
modification of the original beyond rearrangement. This 
modification may include changes in verb forms or the addition of 
connectives. As a result, the task is arguably one of 
substitution, though some amount of rearrangement does play a 
part. 

The next two most disputed items were completion questions. 
The word insertion task (item #3) asked the examinee to insert 
words into an incomplete sentence to make the sentence logically 
and grammatically correct. Item #35 is conceptually similar, 
requiring incorporation of appropriate punctuation. In both 
cases, the items were most commonly mistaken for members of the 
substitution/correction category, probably because the stimuli 
were to be corrected and no blanks (which are commonly associated
with the completion format) were included.

Scoring Reliability

The results of the analysis of scoring agreement are 
presented in Table 5. Shown are generalizability coefficients 
for each item (ordered by category within form), where each form
of 10-11 items was scored by a different group of four raters. 
These coefficients reflect the level of reliability that would be 
obtained from a single reading with level differences among
raters included as measurement error. Primary interest is in the
relative differences among items as opposed to the absolute 
reliability levels. The median coefficients for Forms A-D were 
.87, .85, .67, and .73, respectively. Taking the bottom third of
the coefficients in each form, some consistencies are evident.
First, reordering/rearrangement and substitution/correction items
are among the ones with the lowest coefficients. In three of the 
four forms, completion items also appear in this group. Several 
subtypes appear several times each (e.g., word rearrangement, 
sentence combining, and sentence completion), though the presence
of single instances of several subtypes in the item set suggests 
that this consistency be cautiously judged. 



Insert Table 5 about here 



To assess the relationship between item category and scoring 
reliability, the product-moment correlation between category 
membership and generalizability was computed collapsing the items 
across forms. The relation was as predicted, though moderate at 
-.36 (t = -2.38, N = 40, p < .05).
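
As a check on the reported statistic, the t value for a product-moment correlation can be recomputed from r and the number of items. The sketch assumes that the 40 reported alongside the correlation is the number of items entering it (so that the test has 38 degrees of freedom); the function name is illustrative.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing a zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Reported correlation between category membership and generalizability.
t_value = t_for_correlation(-0.36, 40)
```

Under that assumption the recomputed value rounds to the reported -2.38.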

Another view of the relationship between category and
scoring reliability is presented in Table 6, which shows the 
median and range of generalizability coefficients by category, 
again with the coefficients collapsed across forms. The highest 
medians (.93 and .87) and narrowest ranges are associated with 
the multiple-choice and selection/identification categories.
Reordering/rearrangement and completion evidence the lowest 
medians (.56 and .67), and completion and substitution/correction 
the widest ranges. 



Insert Table 6 about here 



Some insight into the causes of disagreement for particular 
items can be gained from an informal look at the scores assigned 
by the raters. In several cases, the data suggest that low 
agreement levels could be attributed to a single rater (though 
not always the same single rater). In some instances, a rater
appeared to misunderstand the allowable range of scores, perhaps
from having to repeatedly switch scales from item to item. On
form B item #16 (sentence ordering), one of the raters awarded
scores to a quarter of the examinees that were beyond the range
of the scale. On form C #46 (sentence combining), one rater
graded all papers on a 0-1 scale instead of the indicated 0-3 
scale. In this case, if read too quickly the scoring guide might 
be taken to imply that two alternative scoring schemes existed 
(because of the placement of a capitalized "OR"). When the
scores for these single raters are removed, the agreement
coefficients change from .53 to .76 for item #16 and for item #46
from .30 to .61 (where the new coefficients are based on three
rather than four raters).

Individual raters also appeared occasionally to diverge from 
the group because they applied the guidelines for what 
constituted a correct answer more or less strictly. On form D 
#40 (a word rearrangement task), one of the raters consistently
gave credit to a greater range of responses than the key allowed, 
presumably because the rater believed the added responses to be 
correct. Removing this rater's scores resulted in an increase in 
the generalizability coefficient from .52 to .77. For item #41 
(a sentence revision task) on the same form, an opposite 
situation occurred. Here, three of the four raters expanded on 
the key, but did so consistently among themselves. The fourth 
rater followed the key strictly, generally crediting only those 
sentences that exactly matched the ones listed in the guide. 
Removing this rater's scores increased agreement from .29 to .73. 

On other items, disagreement was more widely evident, 
largely because the key failed to provide enough guidance or its 
guidance was not completely correct. For form B item #6 
(sentence completion), the key gave four examples and the
direction to credit any "noun that makes semantic sense." This 
direction apparently left too much room for judgment, with some 
raters awarding credit for completions like "awareness" and
"root" in the phrase "the ____ of her conscience." For
item #35 on form C (requiring punctuation of a passage), the key
provided only a single example of a correct response when many
correct responses were possible, and when correct and incorrect
responses could not be easily distinguished because of the 
complexity of the passage. Finally, in more than one instance 
the key listed a finite set of correct responses that turned out 
not to be exhaustive. On form C item #19, a word rearrangement 
task, the key indicated as acceptable a set of sentences about 
computer literacy. Many students, however, constructed sentences 
like the following: "Before the 1980s a major issue was in
literacy not computer education." Some raters apparently
believed such sentences, though perhaps awkward, to be correct, 
whereas others did not. 

Scoring disagreement was also evidenced on the multiple- 
choice items. Some disagreement is expected even for these 
putatively objective questions because of both the fallibility of 
humans and the experimental nature of the grading. In most 
cases, the exact cause of disagreement is difficult to infer as 
the examinee's response is extremely limited (e.g., a check mark 
next to an answer option) and because the reason for the rater's 
grading was not recorded. Random errors might be caused by
misreading the key, among other things. More consistent
inaccuracies might be generated by grading without the key,
perhaps using an incorrect memory of it or relying on incomplete 
content knowledge in place of it. Such systematic inaccuracy 
should be associated with particular raters. Two of the three 
multiple-choice items with the lowest generalizability 
coefficients showed this pattern; removing one of the raters 
from form C #18 raised the generalizability coefficient from .72 
to .90; an equivalent deletion for form A item #26 changed the
coefficient from .89 to .95. 

In some instances, however, the causes of disagreement in 
scoring multiple-choice items could be more definitively 
inferred. This situation was true of form D item #21, which had 
a generalizability coefficient of .80. This item asked the 
examinee to identify the error in a sentence by marking the 
letter that corresponded to the underlined phrase containing the 
error. Several examinees chose two options — either by indicating 
two letters or by writing in corrections for two phrases. Credit 
for these responses was given by some raters if the correct
option was included. 

In Table 7 the median generalizability coefficients are 
shown stepped up for scores produced by multiple gradings. For 
example, the mean of two ratings for selection/identification
produces a value equivalent to a single rating for the
multiple-choice category, but four ratings are needed to achieve this
level for construction or substitution/correction items. 
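
The step-up can be illustrated with a short sketch. It assumes that the adjustment follows the Spearman-Brown form for the reliability of a mean of k ratings; the report cites Winer (1971) without giving the formula, so this equivalence is an assumption, and the function name is hypothetical.

```python
def step_up(single_rating_coefficient, k):
    """Coefficient for the mean of k ratings, given the single-rating value."""
    rho = single_rating_coefficient
    return k * rho / (1.0 + (k - 1) * rho)

# Median single-rating coefficient reported for selection/identification.
stepped = {k: round(step_up(0.87, k), 2) for k in (1, 2, 3, 4)}
```

With the selection/identification median of .87, the mean of two ratings yields about .93, which matches the single-rating median reported for the multiple-choice category.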



Insert Table 7 about here 



Discussion 

This paper presented an initial scheme for classifying 
constructed-response items and the results of two analyses of it.
The first analysis found substantial agreement in the assignments 
of items to the categories; over all items, the median number of 
judges disagreeing with the intended categorization of an item 
was 2 (out of the 27 responding). There was, however, more
disagreement in some cases than might have been expected given 
this consensus; even one of the multiple-choice items was 
classified differently from its intended assignment by one-fourth 
of the judges. 

A possible explanation for many of the disagreements 
involves a confusion between two characteristics of an item — what 
the examinee is expected to accomplish, and how that is to be 
done. For example, multiple-choice item #13 requires the 
examinee to fill in a blank with the word that best fits the 
meaning of a sentence, and to do this by choosing one of five 
alternatives presented. An apparent focus on the "what" (fill in 
the blank) rather than the intended "how" (by choosing among 
options) led a number of judges to classify this item as one of 
completion. 

It seems plausible to conjecture that many, if not most, 
such disagreements could be eliminated by providing more detailed 
instructions and examples to judges. A few ambiguities would 
remain and might require elaborating the definitions of some 
categories. For example, the description of the reordering/ 
rearrangement category might be extended to state explicitly that 
assignment to this category requires that the elements presented 
are to be rearranged with no modification whatsoever and with no 
addition of further elements, however minor; such an explanation 
might have forestalled the large number of judgments placing a 
construction shift item in this category. 




So far as differences among the categories are concerned, 
selection/identification stands out as yielding a higher 
proportion of items on which there was appreciable disagreement 
than any other category. All eight items in this category 
exceeded the median number of disagreements for the entire item 
set. However, this category was represented by only two item 
formats, not necessarily a representative sample of those that 
could be created. (Moreover, three of the six multiple-choice 
items, included in the analysis to provide a baseline against 
which to compare other formats, also produced more than the 
median number of disagreements.) It would be premature to 
conclude that any one category is inherently less clearly 
identifiable than the others. 
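
The counts cited in this part of the discussion can be rechecked from Table 4. The sketch below transcribes the "number of judges diverging" column by category and recomputes the overall median and the number of items in each category that exceed it. The variable names are illustrative, and the figures are only as reliable as the transcription of the table.

```python
from statistics import median

# Number of judges (out of 27) diverging from the intended category,
# transcribed from Table 4 and grouped by intended category.
divergences = {
    "multiple-choice": [1, 2, 2, 3, 3, 7],
    "selection/identification": [3, 4, 4, 5, 7, 7, 7, 8],
    "reordering/rearrangement": [0, 0, 0, 0, 1, 1, 7, 8],
    "substitution/correction": [1, 2, 2, 2, 3, 6, 6, 9, 17],
    "completion": [0, 0, 0, 1, 3, 4, 10, 12],
    "construction": [0, 0, 0, 0, 0, 0, 1],
}

# Pool all items to get the median for the entire item set.
all_counts = [c for counts in divergences.values() for c in counts]
overall_median = median(all_counts)

# Count, per category, the items with more disagreements than the median.
items_above_median = {
    category: sum(c > overall_median for c in counts)
    for category, counts in divergences.items()
}

print("items:", len(all_counts), "median:", overall_median)
for category, n_above in items_above_median.items():
    print(f"{category:28s} {n_above} of {len(divergences[category])}")
```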

Turning to the study of agreement in scoring students' 
responses, what is most salient is the high variation across and 
within categories. Differences across categories, such as those 
between multiple-choice and reordering/rearrangement, suggest 
that at least some meaningful distinctions can be identified 
through this classification scheme. The wide variation within 
such categories as completion and substitution/correction is 
somewhat artifactual, due in part to ambiguities in the keys for 
specific items or the actions of individual raters. Still, even 
accounting for these correctable factors some variation remains, 
suggesting the need to subdivide some item categories. 

As expected, the study detected a significant relationship 
between the openness of the response and scoring agreement. This 
relationship was moderate, with the most open responses (those to 
the three construction items) yielding reasonably good agreement. 
Underlying this relation would appear to be the openness of the 
scoring key. In the case of the construction items, each had a 
very detailed guide noting the components that should be included 
as well as the number of points to be credited or debited for 
specific features of the writing. Some of the latter required 
judgment in scoring; for example, determining whether to deduct a 
point for "formatting the information in an inefficient or 
disorganized way." Such a requirement was evidently less a 
source of disagreement than the scoring of more structured items 
in which it was not possible to provide an exhaustive key and 
judges had to determine whether an answer was close enough to an 
ideal response to receive credit. 

Several judges participating in this study provided comments 
critical of the keys to some of the items they were asked to 
score. Many of these criticisms were well-founded — there were 
instances in which a purportedly exhaustive key was not 
exhaustive, as well as other errors and ambiguities in the 
formulae specified for deriving item scores. It is evident in 
retrospect that as much effort must be invested in reviewing and 
revising the keys for such items as is invested in the items 
themselves. It was not possible, given the limits on time test 
development staff could devote to an experimental investigation, 
to subject either the items or the keys to the exhaustive reviews 
typical for conventional items, much less to the still more 
demanding reviews we would now expect to be required for items 
like those employed here. Because of this fact, and because of 
the newness of the tasks required of those participating, it is 
reasonable to view these results as an underestimate of what 
might be obtained in operational use of these item types. 

Of some import in considering these results is whether 
scoring agreement is good enough (or could be made good enough 
with experience, a very thorough review process, and perhaps 
multiple ratings) to permit using these item formats in "high 
stakes" tests. Some preliminary judgments can be derived using 
as benchmarks the admittedly imperfect multiple-choice 
coefficients for an upper bound and those commonly found for 
essays as a lower one. Comparisons also need to consider the 
similarity of the item category with these benchmarks in terms of 
the openness and length of the response. For essay items, 
reasonable approximations might be the single-reader coefficients 
reported in Breland et al.'s comprehensive writing assessment 
investigation, which ran from the low .50s to mid .60s (Breland, 
Camp, Jones, Morris, & Rock, 1987). 

Using these standards, it would seem as if 
selection/identification items come reasonably close to duplicating 
"objective" levels of agreement: the median and range for this 
category are very similar to those for multiple-choice. The 
construction items, which approximate the length and complexity 
of essay responses, compare favorably with Breland et al.'s 
values. The remaining category medians are noticeably lower than 
the multiple-choice value but at least as good or substantially 
better than the figures found for essay items. 

The usability of these formats therefore appears to depend 
heavily on whether some degree of error is acceptable, as is the 
case in evaluating productions such as essays, or whether 
absolute accuracy is required, as has been typical of conventional 
"high stakes" tests. This decision also needs to weigh 
the differential benefits gained from the categories (e.g., the 
categories differ in response complexity and, thus, have 
different implications for face validity, instructional 
diagnosis, and influencing teaching and learning). Finally, the 
fact should be considered that aggregations over even a small set 
of items imply a relatively small scoring error overall (but not 
the elimination of such error altogether). 

The present study was an exploratory one, attempting to 
elicit information about the characteristics of a broad sampling 
of item formats. One appropriate direction for further work 
would be to select a limited number of these formats for more 
rigorous investigation. Items and keys would be developed with 
the same series of reviews and revisions employed for operational 
tests, data from a pretest sample would be scored, and items and 
keys would again be revised. A further data collection and 
scoring would then provide a more precise indication of the 
accuracy of scoring likely to be obtainable in practice. 

If the preliminary framework presented here seems to provide 
a useful organizing rubric, it would also be worthwhile to 
attempt to apply it to sets of items drawn from other content 
domains. The limitation to verbal items in the present study was 
deliberate, an attempt to avoid confounding differences in 
formats with differences in content; but the scheme is intended 
to be more general, and its generalizability merits examination. 

If the framework does successfully generalize to other 
contents, a next step might be to examine its empirical validity. 
Do correlations among items tend to be greater within than across 
categories? Does analysis of the cognitive demands of the 
various formats suggest greater similarities within categories? 

Finally, it might be useful to consider additional 
dimensions along which items could be categorized. The one- 
dimensional scheme presented here considers only the raw material 
from which the response is selected or constructed. Surface 
features of the demands of the task, as well as the degree to 
which an open item can yield a closed key, are among the 
additional dimensions that might be added to the scheme. 

Table 1 

Variance Components for Classification into Six Categories of 36 
Items by 27 Judges 

Variance                   Sum of            Mean                    Variance 
Component                  Squares     df    Square      F      p    Estimate 

Category                   2353.10      5    470.62                    2.90 
Judge                        45.46     26      1.75   4.80   .001       .04 
Items within 
  categories                 21.68     30       .72   1.98   .01        .01 
Category-by-judge           120.51    130       .93   2.54   .001       .09 
Item-by-judge 
  within category           284.32    780       .37                     .37 

Table 2 

Frequency Distribution of Product-Moment Correlations Between Judges' 
Categorizations and the Intended Item Categorizations for 27 Judges 

Correlation      Frequency 

.65 -  .79           2 
.80 -  .84           2 
.85 -  .89           3 
.90 -  .94           9 
.95 - 1.00          11 

Table 3 

Frequency Distribution of Mean Differences Between Each of 27 Judges' 
Categorizations and the Intended Item Categorizations 

Mean Signed 
Difference         Frequency 

-.74 to -.50           0 
-.49 to -.25           0 
-.24 to -.01           4 
   0 to  .24          19 
 .25 to  .49           2 
 .50 to  .74           2 

Table 4 

Number of Judges Whose Item Categorizations Diverged from 
the Intended Item Categorizations 

Item                                                           # of Judges 
Number   Category                    Item Descriptor           Diverging 

36       Multiple-choice             sentence identification        1 
18       Multiple-choice             error location                 2 
26       Multiple-choice             sentence identification        2 
 1       Multiple-choice             error location                 3 
21       Multiple-choice             error location                 3 
13       Multiple-choice             sentence completion            7 

38       Selection/identification    cloze elide                    3 
 2       Selection/identification    cloze elide                    4 
37       Selection/identification    cloze elide                    4 
32       Selection/identification    keylist                        5 
 4       Selection/identification    keylist                        7 
 9       Selection/identification    cloze elide                    7 
10       Selection/identification    cloze elide                    7 
43       Selection/identification    keylist                        8 

12       Reordering/rearrangement    sentence ordering              0 
19       Reordering/rearrangement    word rearrangement             0 
28       Reordering/rearrangement    word rearrangement             0 
44       Reordering/rearrangement    sentence ordering              0 
40       Reordering/rearrangement    word rearrangement             1 
16       Reordering/rearrangement    sentence ordering              1 
30       Reordering/rearrangement    word ordering                  7 
24       Reordering/rearrangement    classification                 8 

25       Substitution/correction     word correction                1 
45       Substitution/correction     word substitution              2 
 8       Substitution/correction     word substitution              2 
29       Substitution/correction     word substitution              2 
34       Substitution/correction     word substitution              3 
31       Substitution/correction     sentence combining             6 
46       Substitution/correction     sentence combining             6 
 5       Substitution/correction     sentence combining             9 
41       Substitution/correction     construction shift            17 

33       Completion                  word cloze                     0 
39       Completion                  word cloze                     0 
 6       Completion                  sentence completion            0 
22       Completion                  sentence completion            1 
27       Completion                  word insertion                 3 
17       Completion                  paragraph completion           4 
 3       Completion                  word insertion                10 
35       Completion                  punctuation                   12 

 7       Construction                letter writing                 0 
11       Construction                essay                          0 
15       Construction                essay                          0 
20       Construction                announcement writing           0 
23       Construction                essay                          0 
42       Construction                essay                          0 
14       Construction                short explanation              1 

Table 5 

Generalizability Coefficients for a Single Rater's Item Scores 

Item                                                         Generalizability 
Number   Category                    Item Descriptor         Coefficient 

Form A 

 1       Multiple-choice             error location              1.00 
26       Multiple-choice             sentence ident.              .89 
 2       Selection/identification    cloze elide                  .92 
32       Selection/identification    keylist                      .87 
 5       Substitution/correction     sentence combining           .83 
31       Substitution/correction     sentence combining           .75* 
12       Reordering/rearrangement    sentence ordering            .87 
28       Reordering/rearrangement    word rearrangement           .63* 
27       Completion                  word insertion               .93 
 7       Construction                letter writing               .78* 

Form B 

13       Multiple-choice             sentence completion          .97 
36       Multiple-choice             sentence ident.              .98 
 4       Selection/identification    keylist                      .85 
16       Reordering/rearrangement    sentence ordering            .53* 
30       Reordering/rearrangement    word ordering                .49* 
 8       Substitution/correction     word substitution            .98 
45       Substitution/correction     word substitution            .95 
34       Substitution/correction     word substitution            .60* 
33       Completion                  word cloze                  1.00 
 6       Completion                  sentence completion          .23* 
14       Construction                short explanation            .84 

Form C 

18       Multiple-choice             error location               .72 
43       Selection/identification    keylist                      .94 
 9       Selection/identification    cloze elide                  .72 
19       Reordering/rearrangement    word rearrangement           .34* 
25       Substitution/correction     word correction              .74 
46       Substitution/correction     sentence combining           .30* 
17       Completion                  paragraph comp.              .71 
39       Completion                  word cloze                   .63 
35       Completion                  punctuation                  .41* 
20       Construction                announcement writing         .57 

Form D 

21       Multiple-choice             error location               .80 
10       Selection/identification    cloze elide                  .90 
38       Selection/identification    cloze elide                  .86 
37       Selection/identification    cloze elide                  .77 
44       Reordering/rearrangement    sentence ordering            .81 
24       Reordering/rearrangement    classification               .59* 
40       Reordering/rearrangement    word rearrangement           .52* 
29       Substitution/correction     word substitution            .71 
41       Substitution/correction     construction shift           .29* 
 3       Completion                  word insertion               .73 
22       Completion                  sentence completion          .42* 

*Agreement coefficient is in bottom third for the test form. 

Table 6 

Median and Range of Generalizability Coefficients for a Single Rater's 
Scores with Items Collapsed Across Test Forms within Categories 

                                   Median 
Item Category                      Coefficient     Range 

Multiple-choice (6)                   .93          .72-1.00 
Selection/identification (8)          .87          .72- .94 
Reordering/rearrangement (8)          .56          .34- .87 
Substitution/correction (9)           .74          .29- .98 
Completion (8)                        .67          .23-1.00 
Construction (3)                      .78          .57- .84 

Note. The number of items in a category is shown in parentheses. 

Table 7 

Median Generalizability Coefficients for Item Scores for Different 
Numbers of Raters 

                                        Number of Raters 
Item Category                      One     Two     Three    Four 

Multiple-choice (6)                .93     .97     .98      .98 
Selection/identification (8)       .87     .93     .95      .96 
Reordering/rearrangement (8)       .56     .73     .79      .84 
Substitution/correction (9)        .74     .85     .89      .92 
Completion (8)                     .67     .80     .86      .89 
Construction (3)                   .78     .87     .91      .93 

Note. The number of items in a category is shown in parentheses. 







Figure 1 

A Scheme for Categorizing Item Types 



0. Multiple-choice: Items in this class require the examinee to 
choose an answer from a small set of response options. 

Example. Choose the word which, when inserted in the 
sentence, best fits the meaning of the sentence as a whole. 

Unable to focus on specific points, he could talk only about 
______; indeed, his entire lecture was built around vague 
ideas. 

(A) personalities 

(B) statistics 

(C) vulgarities 

(D) particulars 

(E) abstractions 

1. Selection/Identification: This category is characterized by 
choosing one or more responses from a stimulus array. In 
contrast to multiple-choice, the number of possible choices is 
typically large enough to limit drastically the chances of 
guessing the correct answer. In addition, in its ideal form, the 
response to this item type is probably mentally constructed and 
not simply recognized. Examples include keylists, cloze elide 
(i.e., deleting extraneous text from a paragraph), and, via touch 
screen, tracing orally presented directions on a computer 
generated map. 

Example. Delete the unnecessary or redundant words from the 
following paragraph: 

Andy Razaf is not a quickly recognizable name that is 
familiar to most people. Yet Razaf wrote the lyrics to at 
least 500 or more songs, including the words to the popular 
"Ain't Misbehavin'," "Honeysuckle Rose," and "Stompin' at 
the Savoy" as well. The American-born son of an upper class 
African nobleman, he still continues to be overshadowed by 
his composer-collaborators who worked with him, Fats Waller 
and Eubie Blake. 



2. Reordering/rearrangement: Here, too, responses are chosen 
from a stimulus array. However, the task in this case is to 
place items in a correct sequence or alternative correct 
sequence. Examples include constructing anagrams, ordering a 
list of sentences to make them reflect a logical sequence, 
categorizing elements in a list, arranging a series of 
mathematical expressions to form a correct proof, arranging a 
series of pictures in sequence, and putting together a puzzle. 

Example. Rearrange the following group of words into a 
complete and meaningful sentence. Capitalize the first word 
and end with a period. No other marks of punctuation should 
be needed. 

a and be both can comedy enlightening entertaining good 

3. Substitution/correction: This item type requires the examinee 
to replace (as opposed to reorder or rearrange) what is presented 
with a correct alternative. Examples include correcting 
misspellings, correcting grammatical errors, substituting more 
appropriate words in a sentence, replacing several sentences with 
a single one that combines the meanings of each, correcting 
faulty computer programs, and substituting operators to create a 
true mathematical expression. 

Example. Combine the two sentences below into one 
grammatically correct sentence that conveys the same 
information as the original pair. 

1. Stephen King is the author of numerous horror novels. 

2. Many fans of Stephen King assume that he is as crazy as 
some of his characters. 

4. Completion: In this item type, the task is to respond 
correctly to an incomplete stimulus. Cloze, sentence completion, 
mathematical problems requiring a single numerical response, 
progressive matrices, and items that require adding a data point 
to a graph when given appropriate numerical data are examples. 

Example. Fill the blank in the following sentence with one 
word that makes the sentence grammatically and logically 
complete. 

Melodramas, ______ present stark contrasts between 
good and evil, are popular forms of entertainment because 
they offer audiences a world where there is moral certainty. 






5. Construction: Whereas the Completion type requires that a 
stimulus be completed, here construction of a total unit is 
required. Examples are drawing a complete graph from given data, 
listing a country's exports, stating why condensation forms on 
windows, writing a geometric proof, producing an architectural 
drawing, and writing a computer program or essay. 

Example. Describe some event or phenomenon in the natural 
world (e.g., earthquakes, thunderstorms, rainbows) that has 
always interested you and that you would like to know more 
about. What in particular would you like to know about this 
subject, and why? (You will have 1/2 hour in which to write 
this essay.) 

6. Presentation/Performance: This item type requires a physical 
presentation or performance delivered under real or simulated 
conditions in which the object of assessment is in some 
substantial part the manner of performance and not simply its 
result. Examples include repairing part of an automobile engine, 
playing an instrument, diagnosing a patient's illness, teaching a 
demonstration lesson, and giving a theatrical audition. 

Example. Perform two contrasting solo pieces not to exceed 
two minutes each. Timing begins with an introduction in 
which you announce the audition in the following manner: 

"My name is (give name). My first piece is from (title of 
play) by (author). I play the part of (character). My 
second piece is from (title of play) by (author). I play 
the part of (character)." Props are limited to one stool, 
two chairs, and one table. To allow you to show your 
versatility, it is to your advantage to have the greatest 
possible contrast between your pieces. You will be judged 
on your ability to demonstrate control of material; 
flexibility of voice, movement, and expression; and vocal 
and physical articulation. 



Appendix A: Items Organized by Category 

0. Multiple Choice 



1. The following sentence may contain an error in one of the underlined portions. If so, indicate 
below the letter of the portion that contains the error. If the sentence is correct as written, mark "E." 



Once Art Deco is called to your attention, one sees its influence everywhere . 
A B C 

in theater lobbies, in furniture design, even in perfume bottles. No error 

D E 

A. 

B. 

C. 

D. 

E. 



13. Choose the word which, when inserted in the sentence, best fits the meaning of the sentence as a whole. 

Unable to focus on specific points, he could talk only about ______; indeed, his entire lecture 
was built around vague ideas. 

(A) personalities 
(B) statistics 
(C) vulgarities 
(D) particulars 
(E) abstractions 



18. The following sentence may contain an error in one of the underlined portions. If so, circle the letter 
of the option that contains the error. If the sentence is correct as written, mark "E." 

With the invention of the hypodermic syringe and the administration of pure 
A 

morphine in large numbers to wounded soldiers during the Civil War, 

B 

narcotics addiction became a serious social problem in the United States. 

C D 

No error 
E 

A. 

B. 

C. 

D. 

E. 



21. The following sentence may contain an error in one of the underlined portions. If so, circle the 
letter of the option that contains the error. If the sentence is correct as written, mark "E." 

As much as 200 North American Indian languages and dialects have ceased 
A B C 

to exist in that there are no surviving speakers or written records. 

D 

No error 
E 



A. 

B. 

C. 

D. 

E. 




26. Indicate which of the following sentences is grammatically correct and best expresses its meaning. 

(A) Mass determines whether a star will compress itself into a "white dwarf," a "neutron star," or a 
"black hole" after it passes through the "red giant" stage of its life cycle. 

(B) A star's compression of itself will be a "white dwarf," a "neutron star," or a "black hole" after 
it passes through the "red giant" stage of its life cycle, depending on its mass. 

(C) After passing through the "red giant" stage of its life cycle, depending on a star's mass, a star 
will compress until there is a "white dwarf," a "neutron star," or a "black hole." 

(D) After passing through a "red giant" stage of a life cycle, a star's mass will determine if the 
compression of itself is into a "white dwarf," a "neutron star," or a "black hole." 

(E) The mass of a star, after passing through a "red giant" stage of a life cycle, will determine 
whether or not to compress itself into a "white dwarf," a "neutron star," or a "black hole." 



36. Indicate which of the following sentences is grammatically correct and best expresses its meaning. 

(A) Licht did not realize he was being filmed, and when he was caught by the movie camera, he was 
eating a fish that still had its head on and was drinking red wine in great gulps. 

(B) Licht did not realize he was being filmed, and when he was caught by the movie camera, he was 
eating a fish with its head still on, drinking red wine in great gulps. 

(C) Licht did not realize he was being filmed, and when he was caught by the movie camera, he had 
been eating a fish with its head still on and was drinking red wine in great gulps. 

(D) Licht did not realize he was being filmed, and when he was caught by the movie camera, he had 
been drinking red wine in great gulps as he is eating a fish that still had its head on. 

(E) Licht did not realize he was being filmed, and when he was caught by the movie camera, he was 
drinking red wine in great gulps and eating a fish with its head still on. 



1. Selection/Identification 



2. The following passage contains irrelevant or incorrect words that interfere with the meaning or 
produce grammatical errors. Delete these words so that the writing is grammatical and the sense of 
the passage is not disrupted. 

Ludwig van Beethoven's life was not specific particularly rich in 
external events: great occasions were rare puzzles, and he never traveled to 
otherwise distant places. He spent almost all his life in the cities of Bonn 
and Vienna, working on his music. Unlike that of Mozart, who had seen much 
of Europe during his concert tours while still a boy, Beethoven went on very 
few journeys after regrets moving to Vienna ordinarily in November of 1792, 
at the age of 21. A concert tour ago to Prague and Berlin in 1796 and 
another to Prague in 1798 were exceptions; in general, he vastly left Vienna 
and its immediate surroundings only occasionally, and when he did it 
afterwards was to spend a week or so as the guest of aristocratic patrons and 
friends. 

When Beethoven arrived in Vienna he was still despite a member of 
the Bonn court orchestra, as he had been ever since the age of 14, but 
with the extra collapse of the government at Bonn a few years later he 
was left entirely to his own devices. Instead of being able while to 
enjoy the security of a court musician's post, as his father and 
grandfather had than before him, he was forced to find ways to earn his 
uncertainty living purely through his work as a composer, virtuoso 
pianist, and conductor. These problems were equal followed by another 
far more serious: gradually increasing when deafness, which finally 
deprived him of help the ability to hear his own music performed. To 
this severe trial was added the death of his revoked brother Karl in 
1815. Thereafter, Beethoven assumed never guardianship of Karl's 
profligate and dissolute son, a responsibility that caused Beethoven 
agMiing much personal as well as financial embarrassment. The effect of 
all these tribulations can be seen clearly in Waldmüller's scholarship 
portrait of the aging master. 







4. The following passage contains underlined portions that represent possibly inappropriate word choice. 
Read the entire passage. Then, for each underlined word that represents an inappropriate word choice, 
think of a more appropriate choice and look for it on the list below. Write the word from the list 
just above the underlined word in the passage. 

A brainy naturalist once stated that among the many riddles of nature, not 
the least arcane is the migration of fishes. The homing of salmon is a 
particularly bold example. The Chinook salmon of the U.S. Northwest is born 
in a T stream, migrates downriver to the Pacific Ocean as a young molt 
and, after living in the sea for as long as five years, swims back infallibly 
to the stream of its birth to procreate. Its determination to return to its 
birthplace is mythical. No one who has seen a 100-pound Chinook fling itself 
into the air again in a useless effort to overcome a waterfall can fail to 
marvel at the strength of the instinct that draws the salmon upriver to the 
place where it was born. 



NO CHANGE           economic          reliable 
  REQUIRED          experiment        sails 
appreciable         fascinated        spawn 
astronomical        fictitious        surmount 
belittle            frugal            theatrical 
biased              heavily           tremendous 
centuries           immature          defeat 
conceited           invisible         understood 
condescending       learned           unerringly 
conjugate           legendary         unstintingly 
cybernetic          luring            vain 
deeply              molt              vane 
defeat              monumental        vault 
designated          mysterious        violent 
designed            nesting           whim 
despair             perigrinates      NO APPROPRIATE 
destination         purely              REPLACEMENT LISTED 
different           rarely 
dramatic            regimented 


9. The following passage contains irrelevant or incorrect words that interfere with meaning or produce 
grammatical errors. Delete these words so that the writing is grammatical and the sense of the 
passage is not disrupted. 

Just such enough is known about Phillis Wheatley's life to suggest the able 
extent of her poetic talent, for she heard developed it against great odds. 
The time and place of Wheatley's similar birth are as unknown as these of her 
African name, but she probably came from whether what is now called Senegal 
or Gambia. Purchased directly off limits a slave ship in Boston by a wealthy 
tailor, John Wheatley, in primarily 1781, she was losing first teeth, and 
so she was believed to be rich about seven years of age. She learned English 
in sixteen months, and soon more studied Latin as well as the Bible and 
English poetry by Alexander Pope and Thomas Gray. She began writing sudden 
religious verse when she was than thirteen, and she could not have still been 
more than seventeen years old when she published her first poem, announced an 
elegy on the death of the English evangelical preacher George Whitehead. 



10. The following passage contains irrelevant or incorrect words that interfere 
with the meaning or produce grammatical errors. Delete these words by 
crossing them out so that the writing is grammatical and the sense of the 
passage is not disrupted. 

It's worth the drive trip to Medford to enjoy the valley's best and finest 
Mexican-American restaurant place, "Mexican Rose." Every day daily specials 
of fresh new charbroiled seafood, traditional dishes, steaks and ribs, and 
even also vegetarian good meals are served in an art deco atmosphere. Try 
one of their exotic drink libations at the bar, or a pitcher of margueritas 
with dinner. "Mexican Rose" was voted the best top Mexican restaurant in the 
region area. 




32. The word that best completes the sentence below appears in the alphabetical word list that follows the
sentence. Put the number of this word in the blank space.

The gravitational force of a "black hole" in space is _____ strong that not even light can escape it:
any beam that enters the field gets pulled into the so-called hole, where it remains trapped.



 1. actually       11. distant       21. never          31. so
 2. afterwards     12. enough        22. nevertheless   32. such
 3. also           13. especially    23. no             33. that
 4. although       14. extremely     24. not            34. therefore
 5. as             15. force         25. notably        35. this
 6. awfully        16. how           26. otherwise      36. too
 7. because        17. however       27. overly         37. unknown
 8. consequently   18. like          28. probably       38. very
 9. despite        19. more          29. really         39. whether
10. discovered     20. most          30. since          40. while



37. Delete the unnecessary or redundant words from the following paragraph:

Andy Razaf is not a quickly recognizable name that is familiar to most
people. Yet Razaf wrote the lyrics to at least 500 or more songs, including
the words to the popular "Ain't Misbehavin'," "Honeysuckle Rose," and
"Stompin' at the Savoy" as well. The American-born son of an upper-class
African nobleman, he still continues to be overshadowed by his composer-
collaborators who worked with him, Fats Waller and Eubie Blake.



38. The following passage contains irrelevant or incorrect words that interfere with meaning or produce
grammatical errors. Delete these words so that the writing is grammatical and the sense of the
passage is not disrupted.

"Dickens," George Orwell once remarked, "is one of those writers
well worth imitating." Consequently, many different fraction groups
were eager to claim him as were one of their own comatose. Did Orwell
foresee as that someday he too would become just nicely such as a
writer? Almost certainly incomplete he did not. In 1939, when he wrote
dogged those words about Dickens, Orwell was still a true relatively
obscure figure and among dishes those who knew his work at all wrongs, a
highly controversial finally one. Only a year earlier, than his work
had been extent rejected on political grounds flag by his own publishers
in both Britain and the United States tomorrow. Nevertheless and, by
the time hearing of his death in 1950 at the age slightly of forty-six,
he had become old so famous today that his very name entered regret the
language and has remained tight there in the form of the adjective
"Orwellian" birds.



43. The word that best completes the sentence below appears in the alphabetical word list that follows the
sentence. Put the number of this word in the blank space.

Even when they are isolated from sunlight, plants are still able to tell _____ it is day or night.

 1. actually       11. distant       21. never          31. so
 2. afterwards     12. enough        22. nevertheless   32. such
 3. also           13. especially    23. no             33. that
 4. although       14. extremely     24. not            34. therefore
 5. as             15. force         25. notably        35. this
 6. awfully        16. how           26. otherwise      36. too
 7. because        17. however       27. overly         37. unknown
 8. consequently   18. like          28. probably       38. very
 9. despite        19. more          29. really         39. whether
10. discovered     20. most          30. since          40. while







2. Reordering/Rearrangement



12. The four sentences in the following paragraph are out of order. Logically reorder them by indicating
in parentheses what number each sentence should have in the revised paragraph.

( ) However, if the star was originally more massive, equal to three or four of our Suns, it
compresses further and changes from a "white dwarf" into a "neutron star." ( ) At the end of its
life cycle, a star begins to compress after it has burned up all of its hydrogen and helium. ( ) And
if the original star was still more massive, the neutron star continues compressing until it crushes
itself into that most mysterious of all forms in outer space, a "black hole." ( ) If the star was
originally less massive than about two of our Suns, it compresses until it becomes a "white dwarf."



16. The four sentences in the following paragraph by Alfred Hitchcock are out of order. Logically reorder
them by indicating in parentheses what number each sentence should have in the revised paragraph.

( ) Unfortunately, few of the books seemed to have much connection with what one saw at the
local movie theater. ( ) Nobody wrote for the sensible middlebrow moviegoer who was keenly
interested in the craft of the cinema without wanting to make a religion of it. ( ) Thirty or forty
years ago, when the idea of the cinema as an art form was new, people started to write highbrow
treatises about it. ( ) Even earlier began the still-continuing deluge of fan magazines and annuals,
full of exotic photographs but short on solid information.



19. Rearrange the following group of words into a complete and meaningful sentence. Capitalize the first
word and end with a period. No other marks of punctuation should be needed.

a the in not was 1980s issue literacy major before education computer



24. The following is an alphabetical list of subjects people study at
universities. Re-order and classify these subjects into four or five
categories that represent major fields or disciplines. Label your
categories and give a brief explanation of your system of classification.

Accounting
Anatomy
Anthropology
Archaeology
Architecture
Biology
Business
Chemistry
Chemical Engineering
Computer Science
Dance
Drama
Earth Science
Economics
Education
English
Finance
Fine Arts
Foreign Languages
Forestry
History
Law
Linguistics
Marine Biology
Mathematics
Mechanical Engineering
Music
Neurology
Philosophy
Physics
Political Science
Psychology
Sociology
Urban Studies
Women's Studies
Zoology



28. Rearrange the following group of words into a complete sentence. Capitalize the first word and end
with a period. No other marks of punctuation should be needed.

a and be both can comedy enlightening entertaining good




30. Make as many grammatically correct English sentences as you can using only words from the following
list. A sentence may use any number of words from the list, but a word can appear only once in any
sentence.



an 

extremely 

fish 

have 

of 

sense 

sensitive 

smell 



40. Rearrange the following group of words into a complete sentence. Capitalize the first word and end 
with a period. You may add punctuation if you feel it is needed. 

fewer age to people as tend they colds get 



44. The five sentences in the following paragraph are out of order. Logically reorder them by indicating
in the parentheses what number each sentence should have in the revised paragraph.

( ) With the 1986 Tax Reform Act, however, the game plan has changed. ( ) In either case, the years
a person has already worked for his or her present employer count. ( ) How does one get vested in a
company pension plan these days? ( ) Scheduled to take effect this year, the new rules reduce the
vesting period to five years — to partial vesting after three years with full vesting after [illegible].
( ) Until this year most workers had to be employed by a company for ten years before they became
vested — that is, entitled to receive a pension at retirement.



3. Substitution/Correction 



5. Combine the two sentences in (A) by writing a phrase in the blank in (B) that makes (B) a single
grammatical sentence. This sentence should contain the same information and have the same meaning as
the pair in (A).

(A) The discovery of "black holes" is among the most exciting recent developments in astronomy. It
came well after the discovery of "red giant" stars.

(B) The discovery of "black holes," __________ is among the most exciting
recent developments in astronomy.



8. Correct the following sentence by crossing out the one word that produces a grammatical error and
substituting the appropriate word.

Many fans of Stephen King, the author of numerous popular horror novels,
assume that he is so mad as some of his characters.



25. Cross out the words in the passage below that are misspelled. Write the word correctly in the space
at the right of the lines. If there are no misspellings in the line, write nothing on the line.

Sometimes pruning is called a sceince.
Sometimes pruning is called an art. The
defenition depends on the purpose. For the
average gardner, pruning is a means of keeping
plants under control to fill their allotted
spaces. When the plans outgrow their spaces,
they must be dissplined.
Either approach requires some knowledge.
Merely hacking with a saw and pruning shears
is not helpfull to the plant's form or
vigor. This is especially true when amature
pruners shape plants from the top only and
fail to get underneath and cut out older
growth. From Febuary until genuine spring when
the buds begin to break, plants are dormant
and can be pruned. This is the time to do some
serious homework and look at some of the
source books on pruning.

29. Correct the following sentence by crossing out the one word that produces a grammatical error and
substituting the appropriate word.

The sixteenth-century art critic Vasari regarded the painting entitled the Mona Lisa is a wonderfully
faithful reproduction of an actual person; to many nineteenth-century critics, it was a symbol to be
decoded.

31. Combine the two sentences below into one grammatically correct sentence that conveys the same
information as the original pair.

1. The fires set to fumigate the houses of the victims of the Black Death destroyed many
documents.

2. These could have identified the victims and their ancestors.

34. Replace each underlined word or phrase in the passage below with a different word or phrase that
changes the meaning of the original as little as possible.

Some faculty members took me and Rovalla out to lunch in San Jose's finest eatery, my nerves much
assuaged by their kindness and several drinks. In midlunch two men came over to our table, a dean and
a [illegible] young fellow looking something like a composite of the Junior Watergate [illegible] we'd
seen on television, who introduced himself as a lawyer for the university trustees. I said, Oh good, I
need a lawyer: I just got this absurd note about a loyalty oath and fingerprinting. There's not a word
about either in my contract.







41. "The rocky outcrops of North America are still roamed by the bobcat, though it is seldom seen or
heard."

Rewrite the sentence above so that it conveys the same meaning as the original. START your new
sentence with "The bobcat."



45. Correct the following sentence by crossing out the one word that produces a grammatical error and
substituting the appropriate word.

The roads and means of transportation remain as they did thirty years ago; only the town hall with its
television aerial is new.



46. Combine the two sentences below into one grammatically correct sentence
that conveys the same information as the original pair.

1. Stephen King is the author of numerous horror novels.

2. Many fans of Stephen King assume that he is as crazy as some of his characters.



4. Completion

3. Insert words into the sentence below that will make the statement logically and grammatically
complete.

Birds, bees, and various migratory species can tell direction they are
traveling; for example, a migrating flock can use the positions of the Sun or
stars find north.



6. Fill in the blank in the following sentence with a word that makes the sentence grammatically and
logically complete.

Anti-apartheid writer Janet Levine attributes the __________ of her conscience to several
mentors, not the least of these being a Black family maid who spoke bitterly of the injustices in
South Africa.



17. Underneath the paragraph below, write a sentence that could [illegible]
missing from the paragraph. Base your sentence on what has preceded and what follows the space for
the sentence.

Archaeologists believe that they have found the site of the Rose Theater, a sixteenth-
century, open-air playhouse where works by Shakespeare, Marlowe, Jonson [illegible]
performed. Since December, a team of twelve archaeologists has studied the site,
[illegible] 30-year-old office building was razed to make way for a new structure.
__________ But in recent weeks scholars, theater buffs and actors
have protested plans to end the dig, written letters to the newspapers, and attempted to negotiate
with the property owners.



22. Fill in the blank in the following sentence with a word that makes the sentence grammatically and
logically complete.

[illegible] nearly to extinction.



27. Insert the word into the sentence below that will make the statement logically and grammatically
correct.

The human mind delights finding patterns—so much so that we often mistake coincidence for profound
meaning.



33. Fill the blank in the following sentence with one word that makes the sentence grammatically and
logically complete.

Melodramas, __________ present stark contrasts between good and evil, are popular forms of
entertainment because they offer audiences a world where there is moral certainty.

35. Insert whatever punctuation is needed to make the sentence given below
clear and grammatically correct.

This entire allegory I said you may now append dear Glaucon to the previous
argument the prison-house is the world of sight the light of the fire is the
sun and you will not misapprehend me if you interpret the journey upwards to
be the ascent of the soul into the intellectual world according to my poor
belief which at your desire I have expressed rightly or wrongly God knows

39. Fill in the blank in the following sentence with one word that makes the sentence grammatically and
logically complete.

Kate Millett's Sexual Politics (1970) has been regarded as one of the most important texts of the
modern feminist movement, __________ its author is renowned as one of the movement's founders.



5. Construction 



7.

Golden News                    April 20, 1977

SUMMER EMPLOYMENT
EARN & LEARN

Positions opening soon for apprentices in
Medical Services
Food Services
Library Services

Earn $3.50 or more per hour while you
learn a valuable skill.

Send letter of application to:

Tyland Training Center
Box 335
Tyland, CA 99499




Pretend that you are Pat Carson and live at 291 Maatovar Street in Tyland, California. Write a letter
applying for the work-training program in one of the categories listed in the advertisement. You may
either give facts about yourself or make up information that you think will help you be accepted.




11. Directions: Please write an essay on ONE of the following topics. (You will have 45 minutes in
which to write this essay.)

1. "Ours is an age of indifference — a time when people show little interest in social and political
issues."

Do you agree or disagree with this statement? In your essay, provide examples to support your view.

2. "Everything in life changes."

Identify one thing in society that has changed significantly in this century. Explain how this change
has affected our lives. Be specific.



14. You are preparing a report on endangered species of animals. Write one or two sentences in which you
present as much of the information provided by the following graph as possible.

Gorilla Sightings in the Virunga Mountains
of Rwanda, Zaire, and Uganda

[Graph not reproduced: vertical axis marked 240, 260, 280, and 300; data points plotted for 1981 and
1986. Census published in 1986.]



15. Describe some event or phenomenon in the natural world (e.g., earthquakes, thunderstorms, rainbows)
that has always interested you and that you would like to know more about. What in particular would
you like to know about this subject, and why? (You will have a 1/2 hour in which to write this
essay.)



20. You are going to read a transcript of a telephone conversation between two people. After you have
read the conversation, write the announcement that you think Pat Carson should put on the bulletin
board.





Conversation Transcript

Mrs. Stone: Hello, Pat. This is Vera Stone.

Pat Carson: I thought you were away.

Mrs. Stone: Not until tomorrow. But did you read the newspaper this morning? About the Youth Center?

Pat Carson: No, what happened?

Mrs. Stone: The wind storm did a lot of damage to the roof and grounds. The Youth Center staff will need
a lot of help to get it back in shape.

Pat Carson: I'll be glad to help.

Mrs. Stone: Great, but we'll need a crew of workers. See if you can get about 20 volunteers. Could you
put up an announcement outside the principal's office?

Pat Carson: Sure, I'll be glad to.

Mrs. Stone: I'd like to meet on Saturday morning, but I think a lot of the kids have band practice, so
let's meet at 1:00.

Pat Carson: That's been cancelled. Why not have them come at 8:30?

Mrs. Stone: Fine. They should bring tools.

Pat Carson: Like what?

Mrs. Stone: Hammers, rakes, shovels -- wheelbarrows if they can. They shouldn't bring any power tools,
though. That's all we need, an accident with a power tool. They can work til noon and I'll
provide lunch for everybody.

Pat Carson: Great. Then they'll be sure to come. Oh, by the way, do you mean this Saturday or the next?

Mrs. Stone: This one, March 21st.

Pat Carson: Sure, Mrs. Stone. I'll be glad to put up an announcement.

Mrs. Stone: Thanks, Pat. I appreciate your help.



23. Describe your favorite book, poem, film, or piece of music, explaining what features of the work you
find most successful or appealing and what, if anything, could be done to improve it. (You will have
a 1/2 hour in which to write this essay.)



42. Which of your possessions would be the most difficult for you to give up or lose? Discuss why. (You
will have 30 minutes in which to write this essay.)



Appendix B: 



Scoring Guide Organized by Item Category 






0. Multiple Choice

Unless otherwise specified, items should be scored as "1" (correct) or "0" (incorrect).

1. Answer = B

13. Answer = E - Abstractions

18. Answer = B

21. Answer = A

26. Answer = A

36. Answer = E



1. Selection/Identification

Unless otherwise specified, items should be scored as "1" (correct) or "0" (incorrect).



2. See the attached template for an aid in scorins this item. 



Irrelevant words:

specific       vastly         equal
puzzles        afterwards     when
otherwise      despite        help
that of        extra          revoked
regrets        while          never
ordinarily     then           ending
ago            uncertainty    scholarship



Responses are scored on a 0-7 scale by subtracting the number of erroneous responses from 21, dividing 
this figure by three, and rounding to the nearest whole number. (Award a 0 if the result is negative.) 
An erroneous response is either the failure to delete an irrelevant word or phrase (e.g., “that of"), or 
the deletion of a word or phrase that belongs in the passage. 



Example: 4 failures to delete
         3 inappropriate deletions
         7 total errors

         21 - 7 = 14
         14/3 = 4 2/3 = 5 (total score)
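
The deletion-item rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated in the guide, not a scoring program used in the study; the function name `deletion_item_score` and its parameter names are hypothetical. Because the divisor is 3, the quotient never falls exactly halfway between two whole numbers, so "rounding to the nearest whole number" is unambiguous.

```python
import math


def deletion_item_score(max_deletions: int,
                        failures_to_delete: int,
                        inappropriate_deletions: int) -> int:
    """Score a delete-the-irrelevant-words item (hypothetical helper).

    max_deletions is the constant named in the guide for the item
    (21 for item 2). Errors are failures to delete plus deletions of
    words that belong in the passage.
    """
    errors = failures_to_delete + inappropriate_deletions
    remaining = max_deletions - errors
    if remaining < 0:
        return 0  # "Award a 0 if the result is negative."
    # Divide by three and round to the nearest whole number.
    return math.floor(remaining / 3 + 0.5)


# Worked example from the guide for item 2: 4 + 3 = 7 errors.
print(deletion_item_score(21, 4, 3))
```

With the guide's example values (21, 4 failures to delete, 3 inappropriate deletions) the function returns 5, matching the total score shown above.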



4. See Scoring template. 



Key:

 1. learned, fascinated       7. unerringly
 2. No change                 8. spawn
 3. mysterious                9. legendary
 4. No change                10. vain, tremendous
 5. dramatic, legendary      11. surmount
 6. No change                12. No change



Treat the words NO CHANGE REQUIRED as equivalent to the absence of an insertion over an underlined word. 
Score as incorrect the use of NO APPROPRIATE REPLACEMENT when no change is required. 

Responses are scored on a 0-4 scale by dividing the number of correct answers by 3 and rounding to the 
nearest whole number. (Award a 0 if the result is negative.) 
9. Irrelevant words:

such           primarily
able           rich
heard          more
similar        sudden
these          then
of             still
whether        announced
limits





Responses are scored on a 6-point scale (including 0) by subtracting the number of erroneous 
responses from 15, dividing this figure by 3, and rounding to the nearest whole number. (Award 
a zero if the result is negative.) An erroneous response is either the failure to delete an 
irrelevant word or phrase, or the deletion of a word or phrase that belongs in the passage. 

Example: 3 failures to delete
         1 inappropriate deletion
         4 total errors

         15 - 4 = 11
         11/3 = 3 2/3 = 4



10. Key: 

Line 

1. Drop "drive" or "trip"; "best and" or "and finest" 

2. Drop "place"; "Every day" or "daily" 

3. Drop "new" 

4. Drop "even" or "also"; "good"

5. Drop "drink" 

6. Drop "best" or "top" 

7. Drop "region" or "area" 

Responses are scored on a 0-3 scale by subtracting the number of erroneous responses from 10, dividing 
this figure by three, and rounding to the nearest whole number. (Award a 0 if the resulting score is 
negative.) An erroneous response is either the failure to delete an irrelevant word or phrase (e.g., 
"best and"), or the deletion of a word or phrase that belongs in the passage. 

Example: 3 failures to delete
         3 inappropriate deletions
         6 total errors

         10 - 6 = 4
         4/3 = 1 1/3 = 1 (total score)

32. Key: "so"



37. Key: 



Line 

1. Drop "quickly recognizable" 

2. Drop either "at least" or "or more" 

3. Drop either "the words to" or "words to the" 
4. Drop "as well" and "upper class"

5. Drop "still" 

6. Drop "who worked with him" 



Responses are scored on a 0-2 scale by subtracting the number of erroneous responses from 7, dividing 
this figure by three, and rounding to the nearest whole number. (Award a 0 if the resulting score is 
negative.) An erroneous response is either the failure to delete an irrelevant word or phrase (e.g., 
"who worked with him"), or the deletion of a word or phrase that belongs in the passage. 

Example: 7 - 6 = 1
         1/3 = 0 (total score)




38. See Scoring template. 



Key : 

Line 

2. fraction
3. were; comatose
4. as; nicely; as
5. incomplete
6. dogged; true
7. dishes; wrongs
8. finally; than
9. extent; flag
10. tomorrow; and
11. hearing; slightly
12. old; today; regret
13. tight
14. birds



Responses are scored on a 0-8 scale by subtracting the number of erroneous responses from 23, dividing
this figure by three, and rounding to the nearest whole number. (Award a 0 if the resulting score is
negative.) An erroneous response is either the failure to delete an irrelevant word or phrase or the
deletion of a word or phrase that belongs in the passage.

Example: 23 - 6 = 17
         17/3 = 6 (total score)



43. Key: "whether"



2. Reordering/Rearrangement

Unless otherwise specified, items should be scored as "1" (correct) or "0" (incorrect).



12. Key: (3), (1), (4), (2), where (3) indicates that the first sentence belongs in the third position.

Score on a 0-4 scale by awarding 1 point for each correct placement of a sentence.

Example: (4), (1), (3), (2)
1 point + 1 point = 2 points

For imperfect responses only, in addition to awarding points for absolute placement, grant 1/2 point for
each correct sequence of two sentences. For example, the sequence (1), (4), (2), (3) would receive 1
point for sequence: 1/2 point for 1 and 4, and 1/2 point for 4 and 2. Round all scores up to the
nearest whole number.



16. Key: (2), (4), (1), (3), where (2) indicates that the first sentence belongs in the second position.

Score on a 0-4 scale by awarding 1 point for each correct placement of a sentence.

Example: (3), (4), (1), (2)
1 point + 1 point = 2 points

For imperfect responses only, in addition to awarding points for absolute placement, grant 1/2 point
for each correct sequence of two sentences. For example, the sequence (4), (1), (3), (2) would receive
1 point for sequence: 1/2 point for 4 and 1, and 1/2 point for 1 and 3. Round all scores up to the
nearest whole number.
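
One way to read the reordering rule is sketched below in Python. The sketch assumes that a "correct sequence of two sentences" means two adjacent numbers in the response that are also adjacent, in the same order, in the key; that reading is inferred from the worked examples in this guide and is not stated outright. The function name `reorder_score` is hypothetical.

```python
import math


def reorder_score(key: list[int], response: list[int]) -> int:
    """Score a sentence-reordering item (hypothetical helper).

    key and response list the position number assigned to each
    sentence, in the order the sentences are printed.
    """
    if len(key) != len(response):
        raise ValueError("response must assign a number to every sentence")
    # One point for each sentence given its keyed position.
    placement = sum(1 for k, r in zip(key, response) if k == r)
    if placement == len(key):
        return placement  # perfect response: no sequence credit applies
    # Assumed reading: half a point for each adjacent pair of numbers
    # in the response that is also an adjacent pair in the key.
    key_pairs = set(zip(key, key[1:]))
    matching_pairs = sum(
        1 for pair in zip(response, response[1:]) if pair in key_pairs
    )
    # "Round all scores up to the nearest whole number."
    return math.ceil(placement + 0.5 * matching_pairs)


# Sequence example from item 16: no placements correct, two matching pairs.
print(reorder_score([2, 4, 1, 3], [4, 1, 3, 2]))
```

For the item 16 sequence example, (4), (1), (3), (2) against the key (2), (4), (1), (3), the function finds no correct placements and two matching pairs (4-1 and 1-3), giving a score of 1.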



19. Key: 

* Before the 1980s computer literacy was not a major issue in education. 

* Computer literacy was not a major issue in education before the 1980s. 

* Computer literacy was not before the 1980s a major issue in education. 

* Computer literacy was not in education before the 1980s a major issue. 

* In education before the 1980s computer literacy was not a major issue. 

* In education computer literacy was not a major issue before the 1980s. 

* Not before the 1980s was a major issue in education computer literacy. 

* Not before the 1980s was computer literacy a major issue in education. 
24. Categorization Task:

Raters must determine whether the four or more categories are logical and whether classification
into these categories is consistent.

Scores are awarded on a 0-9 scale by giving a point credit for each logical classification,
assessing a point penalty for each illogical or missing classification, dividing the total by 4, and
rounding to the nearest whole number. Award a 0 score if the result is negative or if the
categorization scheme is illogical on the whole.



28. Key: 

A good comedy can be both entertaining and enlightening.
A good comedy can be both enlightening and entertaining.



30. Acceptable responses must make reasonable sense and be appropriately capitalized and punctuated. 

Key: 

Acceptable Responses: 

Fish. 

Fish have an extremely sensitive sense of smell. 

Fish have sense. 

Fish smell. 

Fish smell extremely. 

Have fish. 

Have sense. 

Smell.

Smell fish. 

Unacceptable Responses: 

Fish extremely. 

Fish have smell. 

Fish sense an extremely sensitive smell. 

Fish smell sensitive. 

Have fish sensitive smell. 

Sense extremely. 

Sense fish. 

Sensitive fish have smell.

Sensitive smell have fish.

Smell extremely.

Score on a 0-4 scale with 1 point for 1-2 acceptable responses, 2 points for 3-4 acceptable responses, 3
points for 5-6 acceptable responses, and 4 points for 7 or more acceptable responses. Deduct 1 point
for 1-2 unacceptable responses, 2 points for 3-4 unacceptable responses, etc. If resulting score is
less than 0, award a 0.
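
The banded scoring for item 30 can be written compactly, as in the Python sketch below. It assumes that the "etc." in the deduction rule continues the same pattern of one point per two unacceptable responses; the function name `sentence_list_score` is hypothetical.

```python
import math


def sentence_list_score(acceptable: int, unacceptable: int) -> int:
    """Score item 30 from counts of acceptable and unacceptable sentences.

    Hypothetical helper: one point per band of two acceptable responses,
    capped at 4, minus one point per band of two unacceptable responses
    (assumed continuation of the guide's "etc."), floored at 0.
    """
    earned = min(4, math.ceil(acceptable / 2))
    deducted = math.ceil(unacceptable / 2)
    return max(0, earned - deducted)


# Five acceptable sentences and two unacceptable ones: 3 - 1.
print(sentence_list_score(5, 2))
```

A response with five acceptable and two unacceptable sentences earns 3 points and loses 1, for a score of 2.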



40. Key: 

* As they age, people tend to get fewer colds. (comma optional)

* People as they age tend to get fewer colds. 

* People tend as they age to get fewer colds . 

* People tend to get fewer colds as they age. 



44. Key: (3), (5), (1), (4), (2), where (3) indicates that the first sentence belongs in the third position
in the paragraph.

Alternate Key: (2), (5), (3), (4), (1)

Score on a 0-5 scale by awarding 1 point for each correct placement of a sentence. For imperfect
responses only, in addition to awarding points for absolute placement, grant 1/2 point for each correct
sequence of two sentences. For example, the sequence (5), (1), (4), (2), (3) would receive 1.5 points
(rounded to 2) for sequence: 1/2 point for 5 and 1, 1/2 for 1 and 4, and 1/2 for 4 and 2. Round all
scores upward to the nearest whole number.





3. Substitution/Correction

Unless otherwise specified, items should be scored as "1" (correct) or "0" (incorrect).



5. Correct Solutions: 

a) coming well after the discovery of "red giant" stars

b) coming well after that of "red giant" stars

c) occurring well after the discovery of "red giant" stars

d) occurring well after that of "red giant" stars

e) which came well after the discovery of "red giant" stars

f) which came well after that of "red giant" stars

g) which occurred well after the discovery of "red giant" stars

h) which occurred well after that of "red giant" stars

i) made well after that of "red giant" stars

j) made well after the discovery of "red giant" stars

k) which was made well after the discovery of "red giant" stars

l) which was made well after that of "red giant" stars

m) coming well after "red giant" stars were discovered



8. Key: Change "so" to "as" in line 2. 

Many fans of Stephen King, the author of numerous popular horror novels, 
assume that he is as mad as some of his characters. 



25. Key: 

LINE

1. science
3. definition
4. gardener
6. plants
7. disciplined
10. helpful
11. amateur
14. February

Score on a 0-4 scale, awarding 1/2 point for each corrected misspelling and subtracting 1/2 point for
each originally correct spelling that is misspelled. Round up to the nearest integer. Award a 0 if the
result is negative.
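
As a rough illustration, the half-point rule for item 25 can be expressed in Python as follows. The function name `spelling_item_score` and its parameter names are hypothetical, and the sketch simply applies the three steps in the order the guide gives them.

```python
import math


def spelling_item_score(corrected: int, newly_misspelled: int) -> int:
    """Score item 25 (hypothetical helper).

    corrected: misspellings from the key that the examinee fixed.
    newly_misspelled: originally correct words the examinee misspelled.
    """
    raw = 0.5 * corrected - 0.5 * newly_misspelled
    # Round up to the nearest integer; award 0 if the result is negative.
    return max(0, math.ceil(raw))


# Five corrections and no new misspellings: 2.5 rounds up.
print(spelling_item_score(5, 0))
```

An examinee who corrects five of the eight keyed misspellings and introduces none has a raw score of 2.5, which rounds up to 3.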



29. Key: Change "is" to "as" 

The sixteenth-century art critic Vasari regarded the painting entitled the Mona Lisa as a wonderfully
faithful reproduction of an actual person; to many nineteenth-century critics, it was a symbol to be
decoded.



31. MANY CORRECT RESPONSES ARE POSSIBLE. 
Responses should be scored as follows: 



3: The response is a grammatical sentence that contains all of the original information.

Example: "The fires set to fumigate the houses of the victims of the Black Death destroyed many
documents that could have identified those victims and their ancestors."

2: The response is a grammatical sentence that omits some of the original information.

Example: "Many of the victims and their ancestors could have been identified by documents that were
destroyed by fires set to fumigate the houses."

OR

The response is a sentence with some grammatical or syntactical problem(s) that contains all of the
original information.

Example: "Destroyed by fires set to fumigate the houses, many victims of the Black Death and their
ancestors could have been identified by the documents."

1: The response is a sentence with some grammatical or syntactical problem(s) that omits some of the
original information.

Example: "The victims and their ancestors could be identified by the documents, but fires set to
fumigate the houses destroyed them."

0: The response is not a single sentence, or it is one marked by serious grammatical errors,
incoherencies, and omissions of essential information.

Example: "To get fumigate from the Black Death many houses were burned and it destroyed many
documents."



34. THERE ARE MULTIPLE CORRECT POSSIBILITIES. 

Score on a 0-4 scale, awarding 1/2 point for each acceptable substitution of a synonym for an underlined
word or phrase. Round up to the nearest integer.

Examples: eatery ... restaurant
          assuaged ... soothed
          kindness ... hospitality



41. Key: 

* The bobcat still roams the rocky outcrops of North America, though it is seldom seen or heard.

* The bobcat still, though it is seldom seen or heard, roams the rocky outcrops of North America.

* The bobcat, though it is seldom seen or heard, still roams the rocky outcrops of North America.



45. Key: Change "did" to "were" 

The roads and means of transportation remain as they were thirty years ago; only the town hall with its 
television aerial is new. 



46. Key: There are more than a dozen legitimate ways to do this. 

Responses are scored as follows: 

3: The response is a grammatical sentence that contains all of the original information. 

Example: "Many fans of Stephen King, the author of numerous horror novels, assume that he is as crazy 
as some of his characters." 

2: The response is a grammatical sentence that omits some of the original information. 

Example: "Many fans of his numerous horror novels assume that Stephen King is also crazy." 

OR 

The response is a sentence with some grammatical or syntactical problem(s) that contains all of the 
original information. 

Example: "Stephen King is the author of numerous horror novels and is assumed by many of his fans 
that he is as crazy as some of his characters." 

1: The response is a sentence with some grammatical or syntactical problem(s) that omits some of the 
original information. 

Example: "Many fans assume that Stephen King, who is the author of numerous horror novels, and is 
also somewhat crazy." 

0: The response is not a single sentence, or it is one marked by serious grammatical errors, 
incoherencies, and omissions of essential information. 

Example: "Stephen King as author of horror novels, and crazy." 
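
The 0-3 rubric above reduces to three yes/no judgments made by the scorer. The sketch below is one way to combine them; the function and parameter names are hypothetical, and the judgments themselves remain human decisions.

```python
def sentence_combining_score(single_coherent_sentence: bool,
                             grammatical: bool,
                             complete: bool) -> int:
    """Map the scorer's judgments onto the 0-3 scale.

    single_coherent_sentence: False if the response is not a single sentence
        or is marked by serious errors, incoherencies, and omissions.
    grammatical: the sentence has no grammatical or syntactical problems.
    complete: the sentence contains all of the original information.
    """
    if not single_coherent_sentence:
        return 0
    # One point for a usable sentence, plus one for each criterion met.
    return 1 + int(grammatical) + int(complete)
```

Both score-2 cases (grammatical but incomplete, or complete but ungrammatical) fall out of the same sum.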







4. Completion 



Unless otherwise specified, items should be scored as "1" (correct) or "0" (incorrect). 

3. Score as 1 or 0. 

Keys for lines 1 & 2: 

what (direction) 
the (direction) 
the (direction) in which 
the (direction) that 
in what (direction) 
which (direction) 
in which (direction) 

(direction) as 

the (direction they are traveling) in 
what (direction they are traveling) in 
which (direction they are traveling) in 

Key for line 3: 

to (find) 



6. Multiple Keys are possible: 

"development," "evolution," 

"awakening," "growth," 

Or any noun that makes semantic sense. 



17. Key: Many different completions are possible. A credited answer should convey the idea that property 
owners want to halt the dig, or begin construction of the new structure. 



22. Key: Any participial form that makes semantic sense. 

Responses are scored as follows: 

2: a participial form that makes semantic sense 
Example: "attempting" 

1: a participial form that does not make proper semantic sense 
Example: "remembering" 

OR 

a non-participial form that makes semantic sense 
Example: "eager" 

0: a non-participial form that does not make proper semantic sense 
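
The scoring for item 22 awards one point for each of two independent properties of the response. A minimal sketch, with hypothetical names, assuming the scorer has already judged both properties:

```python
def participle_item_score(is_participial: bool, makes_semantic_sense: bool) -> int:
    """Return 2, 1, or 0 according to the item 22 rubric."""
    # Each property met contributes one point; meeting neither scores 0.
    return int(is_participial) + int(makes_semantic_sense)
```

For the guide's own examples, "attempting" (participial, sensible) scores 2, while "remembering" and "eager" each meet only one property and score 1.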



27. Key: "in" after "delights" 

The human mind delights in finding patterns -- so much so that we often mistake coincidence for profound 
meaning. 



33. Key: "which" 



35. Key: A correct answer can be a single sentence or multiple sentences as long as words are not modified, added, 
or deleted. 

There are multiple correct possibilities in addition to the following: 

This entire allegory, I said, you may now append, dear Glaucon, to the previous 
argument; the prison-house is the world of sight, the light of the fire is the 
sun, and you will not misapprehend me if you interpret the journey upwards to 
be the ascent of the soul into the intellectual world according to my poor 
belief, which at your desire, I have expressed -- rightly or wrongly God knows. 

Score on a 0-6 scale awarding 1/2 point for each correctly inserted mark of punctuation and subtracting 1/4 
point for each incorrectly inserted mark of punctuation. Round to the nearest integer and award a 0 if the 
result is negative. 
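
The punctuation scoring rule for item 35 can be written out as follows. This is a sketch under stated assumptions: the guide does not say how ties such as 1.5 are rounded, so halves are rounded up here, and the cap at 6 is inferred from the 0-6 scale. The function and parameter names are hypothetical.

```python
def punctuation_item_score(correct_marks: int, incorrect_marks: int,
                           max_score: int = 6) -> int:
    """Score +1/2 per correct mark and -1/4 per incorrect mark.

    Works in quarter points so that no floating-point rounding is involved.
    Assumptions: ties round up; the result is capped at max_score.
    """
    if correct_marks < 0 or incorrect_marks < 0:
        raise ValueError("mark counts cannot be negative")
    quarters = 2 * correct_marks - incorrect_marks  # raw score in 1/4 points
    rounded = (quarters + 2) // 4                   # nearest integer, halves up
    return max(0, min(max_score, rounded))          # negative results become 0
```

For example, five correct and two incorrect insertions give 2.5 - 0.5 = 2.0, a score of 2.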

39. Key: 



Acceptable Responses: 

* and 

* as 

* for 

* since 

* while 



Unless otherwise specified, items should be scored as "1" (correct) or "0" (incorrect). 

7. Write a Letter 

Sum the total number of "yes" responses, divide by 3, and round to the nearest whole number. A yes/no 
decision is made for each feature noted below. 

Information identifying the writer 

1. Gives the correct name: Pat Carson 

2. Gives the correct street address: 291 Westover Street 

3. Gives the correct city: Tyland 

4. Gives the correct state: CA or California 

5. Gives the correct zip code: 99499 

Information identifying the recipient 

6. Gives the correct name of company: Tyland Training Center 

7. Gives the correct address: Box 33S 

8. Gives the correct city: Tyland 

9. Gives the correct state: CA or California 

10. Gives the correct zip code: 99499 

Date of letter 

11. Gives the date the letter is being written 

12. Places date in appropriate business letter position 

13. Writes an appropriate greeting for a business letter 

14. Punctuates the greeting according to business letter convention 

15. Capitalizes the greeting correctly 

16. Writes the greeting in an appropriate place 

Business Letter Closing 

17. Writes an appropriate closing for a business letter 

18. Punctuates the closing according to business letter convention 

19. Capitalizes the closing correctly 

20. Writes the closing in an appropriate place 



5. Construction 

Reference to the advertisement 

21. Names the newspaper: Golden News 

22. Notes the date of the advertisement: Month, day, year 

23. States the positions that will be opening in the categories 

24. Describes the terms of the employment accurately 

25. Notes the correct salary 

The purpose for writing 

26. States that he/she is applying for a position 

27. Identifies the category (categories) he/she is applying for 

The writer’s qualifications 

28 A - Gives some relevant facts or other background information about the writer’s qualifications for a 
position 

28 B - Gives substantial, relevant information about the writer’s qualifications for a position 

29 Gives additional information about the writer that may help persuade the recipient to accept the 
writer into the program 

Use of language 

30 Creates a respectful, business-like tone 

31 A - Controls grammar and usage fairly well 

31 B - Controls grammar and usage very well 

32 A - Uses words accurately 

32 B - Uses words effectively 

33 Punctuates words correctly (e.g., uses apostrophes appropriately) 

34 Capitalizes words correctly 

Control of sentence structure 

35 A - Generally forms simple sentences correctly 

35 B - Generally forms simple and complex sentences correctly 

35 C - Varies sentence structure effectively (to convey meaning) 

36 A - Punctuates simple sentences correctly. 

36 B - Punctuates simple sentences correctly and complex sentences fairly well. 

36 C - Punctuates simple and complex sentences correctly. 
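
The letter score described at the start of item 7 (count the "yes" decisions, divide by 3, round to the nearest whole number) can be sketched as below. The function name and the list-of-booleans input are assumptions made for illustration; each entry stands for one yes/no feature decision from the checklist.

```python
def letter_score(decisions: list[bool]) -> int:
    """Count the 'yes' decisions, divide by 3, round to the nearest integer."""
    yes_count = sum(1 for decision in decisions if decision)
    # floor(yes_count / 3 + 1/2) in integer arithmetic; a count divided by 3
    # never lands exactly on a half, so no tie-breaking rule is needed.
    return (2 * yes_count + 3) // 6
```

A letter with 30 "yes" decisions would score 10, and one with 5 would score 2 (5/3 is about 1.67).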

14. "Gorilla Sightings" has an 8-point scoring guide: 

Content : 4 points, one each for 

* date of census 

* location of gorillas 

* number of gorillas in 1981 

* number of gorillas in 1986 







Writing: 4 points 

4 = errorless 

3 = 1 error in grammar, syntax, spelling, punctuation, word choice, or the coordination of sentences 
(if more than one sentence is given) 

2 = 2 errors of the sort described above 
1 = 3 or more errors of the sort described above 
0 = incoherent response, or not attempted 

A sample "8" response is: 

"A 1986 census recorded sightings of 280 gorillas in the Virunga Mountains of Rwanda, Zaire, and Uganda; 
this marks an increase from the 240 gorillas sighted in this same area in 1981." 
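
The two-part guide for "Gorilla Sightings" can be combined as in the sketch below. All names are hypothetical. The guide does not say whether content points are still awarded when the writing score is 0; this sketch assumes they are counted separately.

```python
CONTENT_ELEMENTS = (
    "date of census",
    "location of gorillas",
    "number of gorillas in 1981",
    "number of gorillas in 1986",
)

def writing_points(error_count: int, coherent_attempt: bool = True) -> int:
    """4 for no errors, 3 for one, 2 for two, 1 for three or more, 0 if incoherent."""
    if not coherent_attempt:
        return 0
    if error_count < 0:
        raise ValueError("error count cannot be negative")
    return max(1, 4 - error_count)

def gorilla_sightings_score(elements_present: set[str], error_count: int,
                            coherent_attempt: bool = True) -> int:
    """Content points (one per element present) plus writing points."""
    content = sum(1 for element in CONTENT_ELEMENTS if element in elements_present)
    # Assumption: content is counted even when the writing score is 0.
    return content + writing_points(error_count, coherent_attempt)
```

The sample response above includes all four content elements and no errors, so it would receive 4 + 4 = 8.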



20. Write an announcement 



Score of 4 A Successful Message 

Gives all of the essential information 
Presents the information clearly and concisely 
Creates a positive tone 

Is generally free of intrusive errors in spelling, grammar, and punctuation 



Score of 3 An Accurate Message 

Gives all of the essential information 

Presents the information in a way that makes some unnecessary demands on the reader, such as: 

Embedding essential information in irrelevant information 
Creating some confusion because of imprecise wording 
Formatting the information in an inefficient or disorganized way 
Having intrusive errors in spelling/grammar/punctuation 



Score of 2 A Fairly Accurate Message 

Presents most of the essential information 
States the information fairly clearly 



Score of 1 An Attempt to Convey a Message 

Presents some of the essential information 



Essential Information 



Who: Volunteers to help fix up Youth Center 

Mrs. Stone or Pat Carson may or may not be mentioned, as appropriate 

What: Fix roof and work on grounds 

Bring hammers, rakes, other appropriate tools 
Lunch provided 

Where: The Youth Center 

When: Saturday, March 21 8:30-12:00 

Why: To repair damage caused by storm 